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DATA STRUCTURE FOR RANGE-SPECIFIED ALGORITHMS 
Field Of The Invention 

[0001] This invention relates to computer-based communications systems and more 
5 particularly to data structures that represent sets of intervals for use in range- 
specified calculations for such systems. 

Background Of The Invention 

[0002] In the general field of computer-based systems, involving multiple and 
10 varied work stations located at multiple and diverse sites providing services of 
differing classification the procedures for controlling data flow are known to be 
extremely complex. 

[0003] Typically, data flow is governed by sets of rules which dictate, amongst 
15 other things, quality of service, security and metering. The data usually is in the 
form of packets with a header having particular information such as source, 
destination and service related criteria. The manner in which the data is handled 
in the system involves examining the header in relation to these sets of rules* In 
such an environment, range specified rules become the most viable option to 
20 provide an acceptable level of control. This is largely due to the fact that it would 
be practically impossible to compare all of the data with the sets of rules in a high 
speed system. Range-specified rules, according to the following description, can be 
broadly described as a set of rules defined using intervals (or ranges) for each field. 
The fields can be defined arbitrarily, depending on the applications. A typical 
25 example for the fields is the 5-tuple (IP source address, IP destination address, TCP 
protocol, source port, destination port), but any arrangement of any number of 
fields is possible, as long as the corresponding data exists in the packets (header 
and payload) which are going to be matched. The ranges can be thought of as 
integer intervals, but the invention applies to any set of ordered values on which 
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the concept of interval can be defined. A packet matches a rule if each of its fields 
(as extracted, or parsed, from the packet) is contained within the corresponding 
ranges of the rule. If the rule set is ordered top-down, then the best matching rule 
for a packet is the matching rule closest to the top. 

5 

[0004] An example of the environment contemplated by the present invention is in 
an IP router which processes a large number of packets coming from and destined 
for a large number of user sites. To provide service to end users that is better than 
"best effort" the system needs to strictly adhere to sets of rules which dictate how 
10 data packets are processed. Obviously, taking into consideration the vast volume 
of traffic in communications systems such as the Internet, it would be difficult to 
compare each packet with a rule and to determine whether it meets established 
criteria. Thus the aforementioned range based algorithm can be applied. 

15 [0005] For the sake of the following description reference is made to methods of 
implementing algorithms for multi-field packet classification by range specified 
rules used in IP routers. Classification is a very important function that is a part of 
applications such as firewall, Ipsec and Quality of service. Firewall needs to 
classify packets based on pre-defined set of rules so that it can filter /block some 

20 flow packets from entering the network. IPsec needs to classify packets based on 
rules so that specific flow packets can be matched to the corresponding security 
policy and associations that indicate the security algorithms, and secure keys can 
be applied to the flow packets. Quality of service needs to perform classification 
function on packets, so that Quality of service attributes like delay bounds, packet 

25 loss bounds and bandwidth can be associated with the flow packets. In VPN 

environments, all the three applications of firewall, IPsec, and Quality of service 
may have to be applied to the edge router device. Hence, efficient implementation 
of the classification function becomes even more vital in such environments. 
Improved classification algorithms can guarantee high-performance with reduced 
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resource requirements for implementation. The capacity of the existing application 
of the packet classification algorithms can be gracefully enlarged in terms of the 
computational resources. It will be apparent to one skilled in the art, however, that 
the algorithms can also apply to other calculations where a range-based or 
5 specified subset of rules is used. 

[0006] As used in this application packet classification is the process of categorizing 
packets into "flows' 7 in an Internet router based on one or more fields in the packet 
header. All packets belonging to the same flow obey a predefined rule and are 
10 processed in a similar manner by the router. This classification process is used in 
ACLs (Access Control Lists) for security, QoS, or for Metering for instance. 

[0007] An algorithm for Multi-field packet classification by range specification 
takes a rule set and a packet as inputs, finds the best matching rule for the packet 
15 in the rule set based on the values of multiple fields in the packet header. A rule set 
consists of a finite number of rules. Each rule in the rule set contains multiple fields 
specified by ranges, where a range is an integer interval with a lower bound and an 
upper bound. Each rule also has a rule number. 

20 [0008] A single field of a given rule set is a set of integer intervals. Given a set of 
integer intervals, a set of elementary intervals and a set of disjoint intervals can be 
obtained. The elementary intervals break down the set of integer intervals into 
smaller but necessary elements that are non-overlapping, while the disjoint 
intervals combine the overlapping integer intervals in the set of integer intervals 

25 together to form larger integer intervals that are disjoint to each other. 



[0009] A matching rule for a given packet satisfies the principle that the value of 
each field of the packet falls into the value range of the corresponding field of the 
rule. The best matching rule is the matching rule with the smallest rule number 



4 



among all the matching rules in the rule set, given the convention that rules are 
numbered from the highest priority to the lowest priority. 

[0010] The elementary interval, in addition to its application for packet 
5 classification as described above also supports stabbing query. Stabbing query is 
the type of query where a point data is queried against a set of intervals to 
determine which of those intervals contains the point. Stabbing query may be used 
for certain applications, such as IP routing. Data structures for stabbing queries can 
also be extended to multi-dimension to serve for packet classification in IP routers. 
10 As previously described, a packet classification algorithm performs multi- 
dimensional point query against a set of rules, where the point is multi- 
dimensional and each rule consists of multiple intervals (ranges). 

[0011] Given a set of intervals and a point of the stabbing query, the Elementary 
15 Interval Tree is used to represent the set of intervals according to one aspect of the 
invention. Also discussed herein is the Elementary Interval Tree Construction 
algorithm used to construct the data structure, and the Elementary Interval Tree 
Query algorithm to perform stabbing query on the data structure. 

20 [0012] According to this aspect, given an interval [1, u] with two endpoints: the 

lower endpoint 1 and the upper endpoint u, the interval contains a point p if 1 < p < 
u. A set of intervals contains a finite number of intervals, where each interval has 
an identifier. Given a set of intervals, by projecting the endpoints of each interval to 
a line, the endpoints divide the line into small partitions, called elementary 

25 intervals. The elementary intervals break down the set of intervals into smaller but 
necessary elements that are non-overlapping. The elementary interval tree 
proposed in the invention is an augmented binary search tree that stores each of 
the elementary intervals in one node to represent a set of intervals. 



3 



[0013] Also contemplated by the present invention is the design of a data structure 
that represents a set of intervals to find maximum disjoint intervals for the set of 
intervals. Again, this data structure finds application for packet classification in IP 
routers. 

5 [0014] In this aspect an interval [1, u] has two endpoints: lower endpoint I and 
upper endpoint u. Two intervals [/„ and [l 2 , u 2 ] overlap if [/,, uj n [l 2 , u 2 ] * 0. A 
set of intervals contains a finite number of intervals, where each interval also has 
an identifier. Given a set of intervals, / - {I l9 I 2 ,..J n } , the set of disjoint intervals of 

/ is defined as {I x J 2y ...J L } , 

10 1. I x u/ 2 u...u/ n =i { Kji 2 u...u/ L ; 

2. Vi a J b ,a±b,i a ni b =0; 

3. V/„ / z ^u.-Aj/*, /; 6 {/l/j,-..,/.}/ 1<A:<^; 

4. V/ f/ 3/;,/, c/ a ,V/;, CX/V 



15 [0015] The disjoint intervals combine the overlapping intervals in the set of 
intervals together to form larger intervals that are disjoint to each other. 

[0016] This data structure could be used to facilitate intersection query as well as 
stabbing query. Given a set of intervals, the intersection query is to determine 
20 which of those intervals overlap a given interval, while the stabbing query is to 
determine which of those intervals overlap a given point. 

[0017] Intersection query and stabbing query are important for certain applications, 
such as IP routing. The data structure proposed here could also be used to facilitate 
25 multi-dimensional domains problem, such as packet classification used in IP 
routers. Packet classification algorithm performs point query against a set of rules, 
where the point is multi-dimensional and each rule consists of multiple intervals 
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(ranges). Intersection query and stabbing query are also useful for computer 
graphics, large knowledge-based systems, and some computational geometric 
problems. 

5 [0018] In this aspect, given a set of intervals, the Disjoint Interval Tree represents 
the set of intervals to facilitate intersection query, stabbing query and packet 
classification. Also, the Disjoint Interval Tree Construction algorithm is used to 
construct a disjoint interval tree, and the Disjoint Interval Tree Point Query 
algorithm is used to perform stabbing query, and the Disjoint Interval Tree Interval 

10 Query algorithm is used to perform intersection query. 

[0019] Prior solutions for indexing intervals to support intersection query and 
stabbing query include segment tree, interval tree, priority search tree, interval 
binary search tree, point-range tree, etc. However, no solution has ever been 
15 proposed to find disjoint intervals for a given set of intervals. 

Prior Art 

[0020] The prior art solutions respecting the disjoint graph aspect include FIS (Fat 
Inverted Segment) tree based classification algorithm, Ternary Content 
20 Addressable Memory (TCAM) implementation, and classical prefix-based 
classification algorithms. 

[0021] The FIS trees based classification algorithm for range-specified rules is one 
of the prior art solution. The FIS trees for multiple fields of a given rule set is 
25 recursively constructed based on the FIS tree for a single field of a give rule set. 

[0022] The FIS tree is a tree-like data structure to represent a set of integer intervals. 
The leaf nodes of the FIS tree store the elementary intervals of the set of integer 
intervals, and all the other nodes beside leaf nodes store the integer interval with 
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the smallest lower bound and the maximum upper bound of all the integer 
intervals stored in their children. As opposed to the binary tree, the edges in the 
FIS tree point from child nodes to parent nodes. 

[0023] Given a rule set based on D fields, the overall FIS trees is a tree containing D 
5 layers of F^FIS trees with one F^FIS tree in the first layer and a set of F^FIS trees in 
the j-th layer, where F^FIS tree is a modified FIS tree to represent the set of integer 
intervals belonging to j-th field of a rule set that each node in the F^FIS tree has an 
associated rule set. The overall D-layers FIS trees is recursively built by 
constructing D layer FIS trees. The associated rule set of a node contains rules 
10 whose j-th field contains the integer interval stored in the node but does not 
contain the integer interval stored in the parent node. Except for the first layer, the 
Fj-FIS trees represents the integer intervals in the j-th field of the associated rule 
sets of the nodes in the F H -FIS trees. To find the best matching rule for a packet in 
the overall FIS trees, multiple traversals toward all the possible nodes are required. 

15 

[0024] Another prior art solution is that of TCAM (Ternary Content-Addressable 
Memory). TCAM is a specialized hardware that allows parallel pattern matching. 
TCAM memory arrays store the rules in decreasing order of priorities and compare 
the input key (packet field) against every element in the array in parallel. The 

20 highest priority rule that is matched to the key is returned. TCAMs are faster than 
software algorithms, but due to the parallel hardware, the magnitude of TCAM 
power consumption is multiple times higher than that of comparable SRAM based 
software solution. In comparison the graph based classification approaches are 
software solutions and rely on graph traversals to find the match to the input key. 

25 However, some methods explore the middle ground e.g. work done involves 
having smaller sized hardware (than TCAM) to do the parallel rule evaluation. 
These methods employ heuristic algorithms that divide the rule-sets across these 
hardware units. 
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[0025] Using classical prefix-based classification algorithms is another prior art 
solution. Through expansion of ranges to replace them with prefixes, classical 
prefix-based solutions, such as hierarchical-tries based classification algorithm, set- 
pruning tries based classification algorithm, could also be used to solve the range- 
5 based classification problem. 

[0026] Given a rule set and a packet as inputs, the FIS trees based classification 
algorithm transforms the rule set into an overall FIS tree and finds the best 
matching rule for the packet on the overall FIS trees (Figure 2). The problem in FIS 

10 trees based classification algorithm is that multiple traversals are required on the 
overall FIS trees toward all potential nodes containing the best matching rule 
during search. Multiple traversals are required because, except for the leaf nodes, a 
node in the overall FIS trees contains the integer intervals of all its children, thus, 
when a packet falls into the integer interval stored in one node, the search has to be 

15 performed on its parents, its parents' parents, and so on. A node can have only one 
parent in its F-FIS, but it can have another parent in the next layer F-FIS (i.e. F. +1 - 
FIS). Multiple parents cause multiple paths to be explored. 

[0027] To overcome this disadvantage, the Disjoint Graph based Classification 
20 Algorithm for range-specified rules according to the present invention, is 
implemented to allow that only a single path be traversed when performing 
classification for a packet. The disjoint graph based classification algorithm not 
only reduces the searching time required by the FIS trees based algorithm, but also 
requires less memory storage and data structure setup time than the FIS trees 
25 based algorithm. 

[0028] The closest prior art with respect to the aforementioned elementary interval 
tree is the Point-Range tree. 



[0029] The Point-Range Tree (PR-Tree) is an augmented Binary Search Tree to 
represent a set of intervals. The PR-Tree (Figure 6) contains two types of nodes: 
Point nodes and Range nodes. All Point nodes are internal nodes and each Point 
node has Value, Left, Right, Equal and Ownedby fields. Value is one endpoint of 

5 an interval, Left (Right) is a pointer to the left (right) subtree holding values less 
than (greater than) Value, Equal contains a list of identifiers of intervals that 
contain Value, and Ownedby contains a list of identifiers of intervals that have 
Value as an endpoint. All Range nodes are leaf nodes and each Range node has 
Valuel, Value2, and Equal fields. Valuel and Value2 are both an endpoint of an 

10 interval, and Equal contains a list of identifiers of intervals that contain the open 
interval (Valuel, Value2). 

[0030] The PR-Tree allows dynamic insertions and deletions, and could maintain 
itself balanced by any balanced Binary Tree scheme. A balanced PR-Tree takes 
15 0(log n) time for search. Insertion, deletion, and storage space have worst case 
requirements of 0(nlogn + m), 0(nlog 2 n+m), and O(nlogn), respectively, where n is 
the total number of intervals in the tree, and m is the number of nodes visited 
during insertion and deletion. 

20 [0031] Multiway range search is another solution that uses B-tree to represent a set 
of intervals, where each node in the B-tree other than the root has k keys and /c+ 1 
subtrees and the endpoints of the set of intervals are stored as keys in the nodes of 
the B-tree. The B-tree data structure requires a linear search within each node to 
find the corresponding subtree. 

25 

[0032] Prior to PR-Tree, data structures such as Segment Tree, Interval Binary 
Search Tree, were also proposed to support stabbing query. 
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[0033] Given a set of intervals and a point, the PR-Tree based algorithm transforms 
the set of intervals to a PR-Tree and finds all intervals that contain the point on the 
PR-Tree. The problem in PR-Tree is that PR-Tree stores duplicated information that 
each elementary interval is stored twice: each of the endpoints is stored in a Point 
5 node and each elementary interval is stored as an open interval in a Range node. 
Both Point node and Range node have a list of identifiers of intervals associated 
with the nodes. 

[0034] Multiway range search uses B-tree to represent a set of intervals. The B-tree 
10 data structure requires a linear search within each node to find the corresponding 
subtree. 

[0035] The Elementary Interval Tree of the present invention reduces the memory 
storage required by PR-Tree by storing each elementary interval only once, which 
15 also reduces the insertion and deletion time correspondingly compared to 
Multiway range search. 

[0036] The closest prior solution for intersection query is the Interval Tree. An 
interval tree is an augmented red-black tree that stores each of the intervals in one 
20 node to represent a set of intervals. Each node also stores the maximum value of 
any interval endpoint stored in the subtree rooted at the node. 

[0037] The Interval Tree allows dynamic insertion and deletion. Both insertion and 
deletion can be performed in 0(log n) time on an interval tree of n nodes. The 
25 storage space is 0(n) since the interval tree stores each interval exactly once in the 
tree. The search time is 0(log n) to find one of the intervals that overlap a given 
interval. But multiple traversals are required to find all intervals that overlap a 
given interval. 



ii 



[0038] The closest prior art solution for multi-dimensional domain problem is the 
method proposed in "Method and system for performing interval-based testing of 
filter rules" issued in March 25, 2003, U.S. Patent No. 6,539,394. The method 
disclosed transforms a set of intervals to a set of prefixes, and then constructs a 
5 decision tree based on the set of prefixes. 

[0039] According to the 6,539,394 patent, given a set of intervals and an interval of 
intersection query, the Interval Tree based algorithm transforms the set of intervals 
to an Interval Tree and finds one of intervals that overlap the given interval. The 
10 problem with the Interval Tree solution is that multiple traversals are required by 
the Interval Tree to find all intervals that overlap the given interval. The Interval 
Tree can not be extended to multi-dimensional domains to support packet 
classification. 

15 [0040] Given a set of intervals and a point, the PR-Tree based algorithm transforms 
the set of intervals to a PR-Tree and finds all intervals that contains the given point. 
The problem in PR-Tree is that PR-Tree stores duplicated information that each 
open interval stored in the Range node has each of its endpoints being stored in a 
Point node. The PR-Tree could be extended to support multiple dimensional 

20 problems, but need to consume too much memory storage. 

[0041] The method proposed in U.S. Patent No. 6,539,394 is a static algorithm that 
needs to reconstruct the decision tree when inserts or deletes of an interval are 
made to the interval set. Also, it needs large preprocessing time to construct the 
25 decision tree. 

[0042] The Disjoint Interval Tree of the present invention can be used to construct a 
data structure that requires only a single path traversal to find all intervals that 



overlap a given interval and requires only half of the storage space compared to 
PR-Tree. 

Summary of the Invention 

5 [0043] According to one aspect of the invention a new tree-like structure is created. 
The tree-like structure, known as a disjoint graph, enables packet classification in 
only one pass of the tree. 

[0044] The disjoint graph is comprised of two new types of data structures: an 
10 elementary interval tree (EIT) and a disjoint interval (DIT). The disjoint graph is 
constructed based on a range-specified rule set for classifying packets. Each rule in 
the rule set has an equal number of fields, D, and each field specifies a range, 
referred to as an integer interval, having a lower and an upper bound. The disjoint 
graph has the same number of layers, D, as there are fields in each rule. The layers 
15 are comprised of nodes, and each node has an associated rule set selected from the 
original (range-specified) rule set. 

[0045] The first layer of the disjoint graph is an EIT. The remaining layers comprise 
a set of DITs and a set of EITs. The set of DITs at a given layer are constructed for 

20 the integer intervals stored in each node of the EITs in the preceding layer. The set 
of EITs at a given layer are constructed for the integer intervals stored in each node 
of the DITs of that layer. The associated rule set of a node of an EIT in a j-th layer 
contains rules whose j-th field contains the elementary interval stored in the node. 
The associated rule set of a node of a DIT in a j-th layer contains rules whose j-th 

25 field is contained by the disjoint interval stored in the node. 

[0046] Elementary intervals are non-overlapping integer intervals. Disjoint 
intervals are intervals formed from overlapping integer intervals by combining 
them to form integer intervals that are disjoint from each other. 
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[0047] In accordance with a first aspect of the present invention there is provided a 
method of creating a tree-like data structure for use in carrying out range specified 
rule evaluations, the data structure having a rule specified rule set where each rule 

5 in the rule set has an equal number of fields and each field specifies a range having 
an upper and lower bound, there being the same number of layers in the structure 
as there are fields in each rule set, the method comprising: creating a first layer of 
the structure made up of a set of non-overlapping ranges; and creating one or more 
additional layers each made up of sets of non-overlapping ranges and sets of 

10 overlapping ranges; wherein range specified rule evaluations are carried out by 
one pass through the data structure. 

[0048] In accordance with a second aspect of the present invention there is 
provided a method of creating an augmented binary tree structure from a range 

15 specified rule set , each rule in the rule set having an equal number of fields and 
each field specifying a range having an upper and lower bound forming a set of 
intervals, the method comprising: projecting end points of each interval of the set 
of intervals onto a line, the end points dividing the line into non-overlapping 
elementary intervals; and forming the tree structure such that each node of the tree 

20 contains a single elementary interval, an indication of original intervals associated 
with the elementary interval, and pointers to any adjacent nodes in the tree. 

[0049] In accordance with a further aspect of the present invention there is 
provided a method of creating a disjoint interval tree from a range specified rule 
25 set each rule in the rule set having an equal number of fields and each field 

specifying a range having an upper and lower bound forming a set of intervals, the 
method comprising: combining overlapping intervals of the set of intervals to form 
larger intervals that are disjoint to each other; and evaluating the overlapping 
intervals to find the maximum disjoint intervals for the set of intervals. 
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[0050] Prior solutions for indexing intervals to support intersection query and 
stabbing query include segment tree, interval tree, priority search tree, interval 
binary search tree, point-range tree, etc. However, no solution has ever been 
5 proposed to find disjoint intervals for a given set of intervals. 

Brief Description Of The Drawings 

[0051] The invention will now be described in greater detail with reference to the 
attached drawings wherein: 

10 

[0052] Figure 1 illustrates a basic rule set with five rules, each rule having three 
fields; 

[0053] Figure 2 shows a FIS tree built for the rule set of Figure 1; 

15 

[0054] Figure 3 illustrates the construction of DITs and EITs; 
[0055] Figure 4 shows a disjoint graph constructed for the rule set of Figure 1; 
20 [0056] Figure 5 shows an interval set S with three intervals; 

[0057] Figure 6 illustrates a PR-Tree built for the set of Figure 5; 
[0058] Figure 7 illustrates the EIT built for the set of Figure 5; 

25 

[0059] Figure 8 illustrates an interval set S with five intervals; 
[0060] Figure 9 shows an interval Tree built for the set of Figure 8; 
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[0061] Figure 10 is a PR-Tree built for the set of Figure 8; 
[0062] Figure 11 is a decision tree built for the set of Figure 8; 
5 [0063] Figure 12a is a DIT for the set of Figure 8; and 
[0064] Figure 12b is an EIT for the set of Figure 8. 

Detailed Description of the Invention 

10 [0065] In accordance with the present invention, given a rule set and a packet, a 
Disjoint Graph based Classification Algorithm is presented. The algorithm 
includes the Disjoint Graph to represent the rule set to support packet 
classification, the Disjoint Graph Construction algorithm to transform the rule set 
into a disjoint graph, and the Disjoint Graph Search algorithm to find the best 

15 matching rule for the packet on the disjoint graph. 

[0066] The Disjoint Graph data structure for a given rule set with D fields in each 
rule has D layers. Each node in the disjoint graph has an associated rule set. The 
first layer of the disjoint graph is an elementary interval tree (EIT) constructed for 

20 the set of integer intervals belonging to the first field of rules in the rule set. Besides 
the first layer, the j-th layer of the disjoint graph consists of a set of disjoint interval 
trees (F^DITs) and a set of elementary interval trees (F^EITs), The set of F^DITs are 
constructed for the integer intervals stored in each node of the F^-EITs in the (j-1)- 
th layer. The set of F f -EITs are constructed for the integer intervals stored in each 

25 node of the F^DITs in the ;-th layer. 

[0067] The disjoint graph is constructed based on two structures: Elementary 
Interval Tree (EIT) and Disjoint Interval Tree (DIT). Given a set of integer intervals, 
its elementary intervals and disjoint intervals can be represented by trees, which 
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are called the elementary interval tree and disjoint interval tree. Each node in EIT 
(DIT) stores one of the elementary (disjoint) intervals of the set of integer intervals. 
The components of the disjoint graph, F f -EIT and F f DIT enhance the EIT and DIT 
by setting an associated rule set (ARS) to each node of EIT and DIT. The associated 
5 rule set of a node in F f -EIT contains rules whose ;-th field contains the elementary 
interval stored in the node, while the associated rule set of a node in F f -DIT 
contains rules whose ;-th field is contained by the disjoint interval stored in the 
node. 

10 [0068] The EIT component alone would be enough to construct a data structure 
that satisfies the requirement of a single path to be traversed to find the best 
matching rule for a packet by constructing EITs for the associated rule set of the 
nodes of the constructed EIT until no more EIT can be constructed. However, 
duplicated sub-EITs will be constructed in such data structure when the associated 

15 rule sets of the nodes in one EIT are overlapping with each other. These duplicated 
sub-EITs are redundant and should be shared to save storage space for the data 
structure. Unfortunately, duplicated sub-EITs may not be shared by two EITs if the 
sub-EIT is in the "middle" of an EIT. Thus, DITs are constructed to enable the 
sharing of the duplicated sub-EITs. 

20 

[0069] For example, Figure 3 is the example of DIT and EIT construction. Figure 3.c 
shows that two EITs have a duplicated sub-EIT, but they can't share the duplicated 
sub-EITs since the sub-EIT is in the "middle" of both EITs. But when we create a 
DIT for each EIT, we can use the DITs to replace the original EITs and let the two 
25 DITs share a single sub-EIT. 



[0070] Figure 4 is the Disjoint Graph G constructed for the set of rules S with 3 
fields given in Figure 1. G has 3 layers: 1) layer 1 contains one F^EIT constructed 
for the rule set S; 2) layer 2 contains six F 2 -DITs for associated rule sets of nodes in 
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the Fj-EJT and two F 2 -EITs constructed for associated rule sets (ARSs) of nodes in 
the six F 2 -DITs, because there are six different ARSs whose sizes are greater than 1 
in the F^EIT and two different ARSs whose sizes are greater than 1 in the six F 2 - 
DITs; 3) layer 3 contains two F 3 -DITs constructed for ARSs of nodes in the two F 2 - 
5 EITs and one F 3 -EIT constructed for ARSs of nodes in the two F 3 -DITs, because 
there are two different ARSs whose sizes are greater than 1 in the two F 2 -EIT and 
two different ARSs whose sizes are greater than 1 in the two F 3 -DITs. 

[0071] The Disjoint Graph Construction algorithm takes a rule set S with N rules 
10 and D fields as input, and returns a disjoint graph G as output. 

Input: rule set S = {R V . ..,R N }, where /?, ={F n ,F i2 ,...,F iD } , ie [IN]. 
Output: disjoint graph G . 



Disjoint Graph Construction Algorithm (S) 
15 Step 1. Construct the first layer of the disjoint graph G 

Construct an F x -EIT for integer interval F X {S) using the EITC algorithm 
Step 2. Construct the A>th layer of the disjoint graph G , k e [2,Z)] 
1. Construct a F k -DIT in the fc-th layer for each node of F k _ x -EIT in the 
(&-l)-th layer and connect the node to the root of the newly constructed 
20 F k - DIT 

a. Given a node v with an associated rule set S v of an F k _ x - EIT in the 
(k - 1) -th layer, construct a F k - DIT V , for the set of integer intervals 
F k (5 V ) using the DITC algorithm, connect v to the root of F k - DIT V . If 
5 V has only one rule, directy associate the rule to v ; 
25 b. If the associated rule set S v , of another node v' is the same as S v , then 

F k -DIT V is shared by v and V , and node v' is also connected to the 
root of F k - DIT V ; 
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c. Repeat a to c to construct F k - DIT s for all the nodes in the 
F k ^-EITs. 

2. Construct a F k - EIT in the £-th layer for each node in the F k - DITs in the 
k-th layer and connect the node to the root of the newly constructed 
F k -EIT 

a. Given a node v with an associated rule set S v of an F k -DIT in the 
k -th layer, construct an F k - EIT V , for the set of integer intervals 
F k (S v ) using the EITC algorithm, connect v to the root of F k -EIT V . If 
5 V has only one rule, directy associate the rule to v ; 

b. If the associated rule set 5 V . of another node v' is the same as S v , then 
F k -EIT V is shared by v and V , and node v' is also connected to the 
root of F k -EIT V ; 

c. Repeat a to c to construct F k - EIT s for all the nodes in the F k - DIT s. 
Repeat Step 2 until the D-th layer of the disjoint graph G is constructed. 

[0072] The Disjoint Graph Search algorithm takes a disjoint graph G constructed 
by disjoint graph construction algorithm and a packet P as inputs, and returns the 
best matching rule of P as output. 

Disjoint Graph Search Algorithm (G, P) 

The search starts from the root of the F x - EIT tree in the first layer of the G . 

Step 1. Search the F k - EIT s in the k -th layer of G , k e [1, D] 

The search performed on the node v of the F k - EIT with the associated 
rule set S v and integer interval I v = [/ v ,w v ] can be divided in three cases: 
Case 1: f k < T v 



1^ 

Perform search on the left child of v if the left child exists. If the left 
child does not exist, there is no matching rule for P in G . 
Case 2: f k > u v 

Perform, search on the right child of v if the right child exists. If the 
right child does not exist, there is no matching rule for P in G . 

Case 3: T v <f k <u v 

Perform search on F k+l -DIT V in the (fc + l)-th layer if the F M -DIT V 

exists. If F M - DIT V does not exists, the best matching rule of P is the 

rule has the smallest rule number in S v . 

Step 2. Search the F M - DIT V in the (k + 1) -th layer of G, ke [l,D-l] 

The search performed on the node v of the F k+l - DIT V with the associated 

rule set S v and integer interval I v = [/ v , u v ] can be divided in three cases: 
Casel: f M <l v 

Perform search on the left child of v if the left child exists. If the left 
child does not exist, there is no matching rule for P in G . 
Case 2: f M > u v 

Perform search on the right child of v if the right child exists. If the 
right child does not exist, there is no matching rule for P in G . 

Case 3: l v <f k+l <u v 

Perform search on F M - EIT V in the (k + 1) -th layer if the F k+] - EIT V 
exists. If F M - EIT V does not exists, the best matching rule of P is the 
rule has the smallest rule number in S v . 
[0073] The Disjoint Graph based classification algorithm requires only a single path 
to be traversed when perform the classification for a packet, thus reduces the 
searching time required by the FIS trees based classification algorithm. In addition, 



since identical EITs (DITs) are constructed only once, the building time and the 
storage space are saved. 

[0074] Also, in accordance with the invention, given a set of intervals and a point, 
the Elementary Interval Tree is presented to represent the set of intervals to 
support stabbing query, the Elementary Interval Tree Construction algorithm to 
construct the set of intervals to an elementary interval tree, and the Elementary 
Interval Tree Query algorithm to perform stabbing query on the elementary 
interval tree to find all intervals that contain a given point. 

[0075] Given a set of intervals / = {/,, I 2 ,...I n } = {[l l ,u l ] 9 [l 2 ,u 2 ],. ..,[/„, wjl / ^ set °f 
elementary intervals of / is defined as {T } , 7 2 , . . . , I K } : 

1. Put all lower bounds and upper bounds of / into an array E, 
E = {l y ,u ]9 ..J n9 u n }; 

2. Sort E in ascending order, delete duplicated elements, denote E as E = 
{e l9 ...,e K }, e x <e 2 <...<e K , 1< K<2n; 

3. I k Q[e k ,e k+l ]Ql i9 \<k<K-\, iff ( e k e U or e k+l € L ), \<i<n\ (two 
successive elementary bounds e k and e k+l define an elementary interval, 
unless the first bound e k is an upper bound and the second bound e k+l is a 
lower bound) 

4. / 1 u/ 2 u...u/ B = 7 1 u7 2 u...u7^_ 1 ; 

5. \fl a j b ,a*b,7 a nl b =0. 



[0076] For example, given a set of intervals (Figure 5) {[10, 30], [5, 35], [4, 8]}, the 
elementary intervals are {[4, 4], [5, 8], [9, 9], [10, 30], [31, 35]}. 



[0077] The Elementary Interval Tree is an augmented binary search tree that stores 
each of the elementary intervals in one node to represent a set of intervals. Each 
node in the elementary interval tree has LB, UB, Left, Right, and AIS fields, where 
LB and UB are lower and upper endpoint of an elementary interval, respectively, 
5 Left and Right are pointers to left and right subtree, respectively, and AIS 
(Associated Interval Set) is a list of identifiers of intervals that contain the 
elementary interval stored in the node. 

[0078] The Elementary Interval Tree Construction (EITC) algorithm takes a set of 
10 intervals I = [I x , I 2 /„ } as input, and returns an elementary tree EIT as output. 

[0079] Elementary Interval Tree Construction Algorithm ( / ) 
Step 1: Create the root node V for EIT 

1. Store the integer interval I v = [l v , u v ] = [l A ,u x ] in V ; 

15 2. Store the list of identifiers of intervals AIS V ={/,} in V; 

3. Remove I Y from I , I = I -I x . 
Step 2: Insert = [/,. , u { ] , / e [2, n] , to the EIT 
1. Compare I ( to I v 
Case 1: u i < l v 

20 If the left child node of V does not exist, v L = 0 , create v L , store I { in 

v L and add I t to AIS of v L . 

If v L * 0 , recursively insert /. to the left sub-EIT with the root v L . 
Case 2: l t > u v 

If the right child node of V dose not exist, v R =0, create , store / f . 
25 in and add 7 / to AIS of . 

li v R ^0, recursively insert I t to the right sub-EIT with the root . 



Case 3: /, n/ v *0 

I L = [min(/,. ,l v ), max(/ ( . , l v ) - 1] 
/j, = [minCu,. , u v ) + 1, max(w ( . , u v )] 
I v = [l v ,u v ] = [max(/,. , J v ), min(w,. , u v )] 
5 Insert integer intervals / t and / s to EIT 

If I L * 0 

If v L = 0 , create v L , store I L in v L ; 
If v L * 0 , recursively insert 7 t to the left sub-EIT with 
the root v L . 

10 If I R *0 

If v R = 0, create v R , store / s in v R ; 
If v x ?t 0, recursively insert I R to the right sub-EIT with 
the root v R . 
2. Remove from / , 1 = 1-1, 
15 Repeating Step 2 until 1 = 0 

[0080] The Elementary Interval Tree Query (EITQ) algorithm takes the elementary 
interval tree EIT constructed for a set of intervals by EITC algorithm and a point P 
as inputs, and returns a list of identifiers of intervals that contains P as output. 
20 Elementary Interval Tree Query Algorithm (EIT, P) 
Start from the root node V of EIT 
Case 1. If l v < P < u v , return AIS V ; 

Case 2. If P < l v , recursively search the left sub-EIT rooted at the left child node 
of V, v t ; 

25 Case 3. If P > u v , recursively search the right sub-EIT rooted at the right child 
node of V , v R ; 
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Case 4. If the EIT is empty, return NULL. 

[0081] The Elementary Interval Tree contains only the Range nodes in the PR-Tree, 
thus it consumes only half of the memory storage required by the PR-Tree. The 

5 Elementary Interval Tree allows dynamic insertion (Step 2 of EITC algorithm) and 
deletion, while maintaining the tree balanced as well as the PR-Tree. Any balanced 
binary tree scheme could be used to perform the tree balancing operation on 
elementary interval tree. The balanced Elementary Interval Tree keeps the 
searching time as 0(log n) and reduces the worst case insertion time to 0(nlog n), 

10 where n is the total number of intervals. 

[0082] The advantages of the Elementary Interval Tree are: 1) reduction of the 
memory storage required by the PR-Tree to half, 2) reduction of the insertion and 
deletion time compared with the PR-Tree. 

15 

[0083] Given a set of intervals S shown in Figure 5, Figure 6 is the PR-Tree 
constructed for S, and Figure 7 is the Elementary Interval Tree built for S. 

[0084] The commercial value of the Elementary Interval Tree lies in the role as 
20 solution to stabbing queries, which is a necessary element in applications such as 
IP routers. Furthermore, the extension of Elementary Interval Tree to multiple 
dimensional domains provides a solution to packet classification in IP routers. 
Classification is a very important function that is a part of applications such as 
firewall, IPsec, Quality of service. Firewall needs to classify packets based on pre- 
25 defined set of rules so that it can filter /block some flow packets from entering the 
network. IPsec needs to classify packets based on rules so that specific flow packets 
can be matched to the corresponding security policy and associations that indicate 
the security algorithms, secure keys to be applied to the flow packets. Quality of 
service needs to perform classification function on packets, so that Quality of 
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service attributes like delay bounds, packet loss bounds, bandwidth can be 
associated with the flow packets. In VPN environments, all the three applications 
of firewall, IPsec, and Quality of service may have to be applied to the edge router 
device. Hence, efficient implementation of the classification function becomes even 
5 more vital in such environments. 

[0085] Given a set of intervals, the Disjoint Interval Tree represents a set of intervals 
to facilitate queries such as stabbing query and intersection query, the Disjoint 
Interval Tree Construction algorithm transforms a set of intervals to a disjoint 
10 interval tree and thus to find the disjoint intervals for the set of intervals, the 
Disjoint Interval Tree Point Query algorithm performs stabbing query on the 
disjoint interval tree, and the Disjoint Interval Tree Interval Query algorithm 
performs intersection query on the disjoint interval tree. 

15 [0086] Given a set of intervals, / = {I x J 2 ,..l n }, the set of disjoint intervals of / is 
defined as {I A J 2 ,...J L }, 

1. I x u/ 2 u...u/ n =/j u/ 2 u...u/ L ; 

2. Vl a J b ,a*b,I a ni b =0; 

3. V/, , /, = /; u ... u I K , I\ e {I x , I 2 /„ } , 1 < k < K ; 

20 4. v/,.,a/V, v/;, i b *i a j t <ti b . 

[0087] The disjoint intervals combine the overlapping intervals in the set of 
intervals together to form larger intervals that are disjoint to each other. For 
example, given a set of intervals {[10, 30], [5, 35], [0, 3], [4, 8], [49, 50]} (Figure 8), 
25 the disjoint intervals are {[0, 3], [4, 35], [49, 50]}. 
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[0088] The Disjoint Interval Tree is a binary search tree that stores each of the 
disjoint intervals in one node to represent a set of intervals. Each node in the 
disjoint interval tree has LB, UB, Left, Right, and AIS fields, where LB and UB are 
lower and upper endpoints of a disjoint interval, respectively, Left and Right are 
5 pointers to left and right subtree, respectively, and AIS (Associated Interval Set) is 
a list of identifiers of intervals that is contained by the disjoint interval stored in the 
node. 

[0089] The Disjoint Interval Tree Construction (DITC) algorithm takes a set of 
10 intervals I = {I l9 1 2 > -Jn 1 as input an d returns a disjoint interval tree DIT as output. 
Disjoint Interval Tree Construction Algorithm ( / ) 
Step 1: Create the root node V for DIT 

1 . Store the integer interval I x = [l x , u x ] in V , I v = [l v , u v ] = [l x , u } ] 

2. Store the list of identifiers of intervals AIS V = {I^} in V ; 
15 3. Remove I { from / , / = / - I x 

Step 2: Insert /, = [Z„u,], i€ [2,n], to the DIT 
1. Compare and I v 
Case 1: w. < l v . 

If the left child node of V does not exist, v L = 0 , create v L , store /, in 
20 v L and add to AIS of v L . 

If v L * 0 , recursively insert 1 { to the left sub-DIT with the root v L . 
Case 2: >u v . 

If the right child node of V does not exist, v R =0, create v R , store /. 
in and add I { to AIS of . 
25 If & 0 , recursively insert /. to the right sub-DIT with the root . 

Case 3: 7 f . nl y ±0. 
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If /. < l v and there exist children lev on the left of v that verify the 
condition u lcv > Z f and leftmostcv is one of these children that is most to the 
left; then 1) discard these children; 2) set l v = l leftmostlcv \ 3) connect 7 v to the 
leftover DIT sub-tree on the left 

5 

If < ly and there are no children lev on the left of v that verify the 
condition u lcv > /. , then set l v = li 

If u i > Uy and there exist children rev on the right of v that verify the 
10 condition l rcv < u i , and rightmostcv is one of these children that is most to the 

right; thenl) discard these children; 2) set u v = u rightmostcv ; 3) connect I v to 
the leftover DIT sub-tree on the right 

If u i > u v and there are no children rev on the right of v that verify the 
condition l rcv < u i , then set u v = u t 

15 

2. Remove /, from / , 1 = 1-1; 
Repeat Step 2 until 1 = 0 

20 [0090] The Disjoint Interval Tree Point Query (DITPQ) algorithm takes the disjoint 
interval tree DIT constructed for a set of intervals by DITC algorithm and a point P 
as inputs, and returns a list of identifiers of intervals that might contain P as 
output. 
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[0091] Disjoint Interval Tree Point Query Algorithm (DIT, P) 
Start from the root node V of DIT 

Case 1. If l y < P < u v , return AIS V ; 
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Case 2. If P < l v , recursively search the left sub-DIT rooted at the left child node 
of V, v L ; 

Case 3. If P>u v , recursively search the right sub-DIT rooted at the right child 
node of V , v R ; 

Case 4. If the DIT is empty, return NULL. 

[0092] The Disjoint Interval Tree Interval Query (DITIQ) algorithm takes the 
disjoint interval tree DIT constructed for a set of intervals by DITC algorithm and 
an interval [Z, u] as inputs, and returns a list of identifiers of intervals that might 
overlap [Z, u] as output. 

[0093] Disjoint Interval Tree Interval Query Algorithm (DIT, I, u) 
Start from the root node V of DIT 

Case 1. If [Z,u]n[/y,M*0, return AIS V ; 

Case 2. If u < l v , recursively search the left sub-DIT rooted at the left child node 
of V, v L ; 

Case 3. If l>u v , recursively search the right sub-DIT rooted at the right child 
node of V , v R ; 

Case 4. If the DIT is empty, return NULL. 

[0094] The Disjoint Interval Tree allows dynamic insertion while the closest prior 
art solution proposed in U.S. Patent No. 6,539,394 does not support dynamic 
insertion as shown in next paragraph. And the disjoint interval tree is able to 
maintain balance by any balanced binary tree scheme. 

[0095] The disjoint interval tree can be used with other data structures such as 
Elementary Interval Tree to form a data structure to support intersection query, 
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stabbing query, packet classification, etc. For example, after constructing the 
disjoint interval tree, it is possible to construct an elementary interval tree for each 
associated rule set in the disjoint interval tree. The data structure formed by 
balanced disjoint interval tree and balanced elementary interval trees takes 0(log 

5 n) time for intersection query or stabbing query. To find all intervals that overlap a 
given interval, DITIQ algorithm could be used to find the set of intervals that are 
possible to overlap the given interval and then the set of intervals that overlap the 
given interval can be quickly found in the small size interval set. Similarly, to find 
all intervals that contain a given point, DITPQ algorithm could be used to find the 

10 set of intervals that are possible to contain the given point. 

[0096] Here the differences of the DIT as compared with the method proposed in 
U.S. Patent No. 6,539,394 are apparent. The method is a static algorithm that needs 
to reconstruct the decision tree when inserts or deletes an interval from the interval 
15 set. 

[0097] Given a set of intervals / = {7,,/ 2 ,.../J = {[/ ]j m 1 ],[/ 2 ,m 2 ],...,[/ m ,mJ}, the 
method performs the following operations: 

1) Puts all lower endpoints {/,,/ 2 ,. ..,/„} to an array, sorts them in ascending 
20 order and deletes duplicated elements to result a set of endpoints 

{le l Je 2 ,...,le i } , i<n , and uses the set of endpoints to form a set of intervals 
LE = {[0,/e 1 ),[/e 1 ,/e 2 ),[/e 2 ,/e 3 ) v ..[/e / ,max)}, where IL£I=/ + 1 and max is the 
maximum possible. For example, given a set of intervals {[1,3], [4,5], [2,8]}, 
we get the interval set LE = {[0,l),[l,2),[2,4),[4,max)}; 
25 2) Performs the same operation on the upper endpoints {u x , u 2 u n } to get a set 
of intervals UE = {(0,01,(0,^], (we,, ue 2 ] 9 ... 9 (ue j9 max]}, where j<n, and 
lt/£l=; + 2; 



3) For the interval set LE , uses w x = flog I LE f| bits to label each interval of the 
interval set starting from all O's for the first interval For example, intervals 
in {[0,l),[l,2),[2,4),[4,max)} will be labeled as 00 for [0,1), 01 for [1,2), 10 for 
[2,4), and 11 for [4, max); 

4) Labels each interval in the interval set UE using w 2 = [log I UE \] bits; 

5) Builds a n x (w^w^ matrix M for the set of intervals / , one row for each 
interval and (w^w^ elements for each row: 1) gets the bit labels of all 
intervals in LE that are contained by the interval, keeps the common bits of 
these intervals and set other bits to wildcard * to get a w 1 bits prefix, and 2) 
gets a w 2 bits prefix for the interval similarly based on the interval set UE. 
For example, the interval [2, 8] contains intervals [2,4), [4, max) in LE that 
are labeled as 10 and 11 respectively, which results 1*; 

6) Constructs a decision tree based on the matrix M: 

a) Choose the column having a minimal number of wildcards and if 

more than one such column, choose the lowest index column having 

the closest equal number of 'l's and '0's, and this column will be the 

first node of the decision tree; 
\ b) Derives two matrices from M by eliminating the rows having '0's and 

'l's, respectively, in the selected column, and by eliminating the 

selected column from the new matrices; 
c) Recursively selects columns from the matrices and creates nodes until 

the decision tree is built, that the given intervals are distinguished 

from each other. 



[0098] An example of the decision tree built for the set of intervals in Figure 8 is 
given in Figure 11. 



[0099] Although particular embodiments of the invention have been described and 
illustrated it will be apparent to one skilled in the art that numerous changes can 
be made without departing from the basic concepts. For example, the treelike data 
structures for creating the disjoint graph, as well as the EIT and DIT can be stored 
5 on a computer readable medium for packet classification. It is to be understood, 
however, that such changes will fall within the full scope of the invention as 
defined by the appended claims. 



